A Workbench for Finding Structure in Texts

Authors

  • Andrei Mikheev
  • Steven Finch
Abstract

In this paper we report on a set of computational tools, connected by an (n)SGML pipeline data flow, for uncovering internal structure in natural language texts. The main idea behind the workbench is that the text representation and text analysis phases are independent. In the representation phase, annotation tools convert the text from a sequence of characters into features of interest. In the analysis phase, statistics-gathering and inference tools use those features to find significant correlations in the texts. The analysis tools make no assumptions about the nature of the feature set and operate at the abstract level of feature elements represented as SGML items.
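
The split between a representation phase and a feature-agnostic analysis phase can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: `annotate`, `feature_stream`, `pmi_table`, the `<W C="...">` element and its token classes are hypothetical; pointwise mutual information over adjacent features stands in for whatever statistics the workbench's tools actually compute; and the SGML pipeline between separate programs is simulated here by passing a string between functions.

```python
import re
from collections import Counter
from html import escape
from math import log2

# --- Representation phase (hypothetical annotation tool) ---
# Converts a character sequence into SGML-style feature elements.
# The only feature is a crude token class stored in attribute C.
def annotate(text):
    elements = []
    for tok in re.findall(r"\w+|[^\w\s]", text):
        if tok.isdigit():
            cls = "NUM"
        elif not tok[0].isalnum():
            cls = "PUNCT"
        elif tok[0].isupper():
            cls = "CAP"
        else:
            cls = "LOW"
        # Escape token text so characters such as "<" cannot break the markup.
        elements.append(f'<W C="{cls}">{escape(tok)}</W>')
    return "\n".join(elements)

# --- Analysis phase (generic, feature-agnostic) ---
# Matches start tags of the form <NAME ATTR="VALUE">; the analysis code
# never inspects element content or what the attribute means.
START_TAG = re.compile(r'<(\w+)\s+(\w+)="([^"]*)">')

def feature_stream(sgml, element="W", attr="C"):
    """Extract the sequence of attribute values for one element type."""
    return [value for name, key, value in START_TAG.findall(sgml)
            if name == element and key == attr]

def pmi_table(items):
    """Pointwise mutual information (bits) for each adjacent feature pair."""
    unigrams = Counter(items)
    bigrams = Counter(zip(items, items[1:]))
    n_uni, n_bi = len(items), len(items) - 1
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        table[(a, b)] = log2(p_ab / (p_a * p_b))
    return table

if __name__ == "__main__":
    sgml = annotate("Mr. Smith paid 40 dollars. Mr. Jones paid 15 dollars.")
    scores = pmi_table(feature_stream(sgml))
    # Print adjacent feature pairs from most to least associated.
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Because `pmi_table` sees only attribute values, replacing `annotate` with a different annotator (for example a part-of-speech tagger) that writes the same kind of elements would leave the analysis code unchanged, which is the kind of independence the abstract describes.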

Similar resources

The Story Workbench: An Extensible Semi-Automatic Text Annotation Tool

Text annotations are of great use to researchers in the language sciences, and much effort has been invested in creating annotated corpora for a wide variety of purposes. Unfortunately, software support for these corpora tends to be quite limited: it is usually ad hoc, poorly designed and documented, or not released for public use. I describe an annotation tool, the Story Workbench, which prov...

POSBIOTM/W: A Development Workbench for Machine Learning Oriented Biomedical Text Mining System

The POSBIOTM/W is a workbench for machine-learning-oriented biomedical text mining. POSBIOTM/W is intended to help biologists mine useful information efficiently from biomedical text resources. To do so, it provides a suite of tools for gathering, managing, analyzing, and annotating texts. The workbench is implemented in Java, which makes it platform-independent.

The Hero's Individuation in the Film Rango

Film is a visual-narrative art which, although it shares a narrative and fictional structure with literary texts, differs from them in aspects such as method, technique, context, the materials used, and its effect on the audience. Rango is one of the greatest animated works in the world, unique in both its form and intent. It was made by Paramoun...

MedLex+: An Integrated Corpus-Lexicon Medical Workbench for Swedish

This paper reports on the work carried out in developing MedLex+, a medical corpus-lexicon workbench for Swedish. This project, which is still under active development, has been going on for some years within the Department of Swedish Language at Göteborg University. At the moment, the workbench incorporates an annotated collection of medical texts, including 20 million tokens and 45,000 docume...

Internet as Corpus: Automatic Construction of a Swedish News Corpus

This paper describes the automatic building of a corpus of short Swedish news texts from the Internet, its application, and its possible future use. The corpus is aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition, and Multi-Text Summarization. The corpus has been constructed using an Internet agent, the so-called newsAgent, downloading Swedish news text f...

Publication date: 1997